Method of answering questions and scoring answers using structured knowledge mined from a corpus of data

ABSTRACT

In a method of answering questions and scoring answers, a title and at least one topical field are identified for a document. A field name and field content associated with the topical field are identified, and a title-oriented document is created by combining the title, the field name, and the field content associated with the topical field. For each title-oriented document, a term in the title is matched to previously established categories to produce a title concept identifier. The topical field is synthesized to produce a field concept identifier and a field content concept identifier. A question is received. The question topic term and the question content identifier are used to identify at least one question-matching relation instance. The title concept identifier of each question-matching relation instance is identified as a candidate answer to the question. Each candidate answer and a corresponding answer score are output.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Continuation of U.S. application Ser. No. 13/710,509 filed Dec. 11, 2012, the entire teachings of which are incorporated herein by reference, in their entirety.

BACKGROUND

The present disclosure relates generally to document management software and question answer generation. Specifically, this disclosure provides for the conversion of unstructured data into structured data. The structured data is used to provide a score for an answer to a question based on instances that an entry appears in the unstructured document.

Structured knowledge bases may provide precise answers to questions. However, they are expensive to construct because people normally must manually code them. Embodiments herein provide a system and method for automatically constructing structured knowledge bases that can be used to answer questions precisely. According to embodiments herein, the structured knowledge bases may be constructed automatically from a natural language corpus, saving much time and expense, yet providing precise answers.

SUMMARY

According to an embodiment herein, a method of answering questions is disclosed. According to the method, a question is received into a computerized question-answer system operating on a computerized device. A plurality of title-oriented documents is automatically constructed by the computerized device from a corpus of data. The corpus of data comprises natural language documents. Each title-oriented document comprises a title and at least one topical field. The topical field comprises a field name and field content associated with the topical field of at least one document in the corpus of data. At least one relation instance is automatically created using the computerized device by combining a field identifier for the topical field, a title concept identifier, and a corresponding field content concept identifier. Terms in the question are analyzed, using the computerized device, to identify at least one question topic term and at least one question content identifier based on previously established question term categories. The question topic term and the question content identifier are automatically compared to the at least one relation instance, using the computerized device, to identify at least one question-matching relation instance. The computerized device automatically generates at least one answer to the question by identifying the title concept identifier of each question-matching relation instance as a candidate answer to the question.

According to another embodiment herein, a method of answering questions and scoring answers is disclosed. According to the method, a title and at least one topical field are identified for at least a portion of at least one document in a grouping of documents. A field name and field content associated with the topical field are identified. At least one title-oriented document is created by combining the title, the field name, and the field content associated with the at least one topical field of the at least a portion of the at least one document. For each title-oriented document, at least one term in the title that corresponds to previously established categories is identified to produce at least one title concept identifier. The topical field is synthesized to produce at least one field concept identifier and at least one field content concept identifier from the previously established categories. At least one relation instance is created by combining, for each field concept identifier, a corresponding title concept identifier of the at least one title concept identifier and a corresponding field content concept identifier of the at least one field content concept identifier. A count for each relation instance is calculated based on a number of occurrences of the title concept identifier and the field content concept identifier within a corresponding one of the at least one documents in the grouping of documents. The count is included with each relation instance. A question is received. The terms in the question are analyzed to identify at least one question topic term and at least one question content identifier based on previously established question term categories. The question topic term and the question content identifier are compared to at least one of the at least one relation instance to identify at least one question-matching relation instance. The title concept identifier of each question-matching relation instance is identified as a candidate answer to the question. Each count within each relation instance corresponding to each candidate answer is added to provide an answer score for each candidate answer. Each candidate answer and a corresponding answer score are output.

According to another embodiment herein, a method of scoring answers to questions in a question-answering system is disclosed. A question is received into a computerized question-answer system operating on a computerized device. A plurality of title-oriented documents is automatically constructed by the computerized device from a corpus of data. The corpus of data comprises natural language documents. Each title-oriented document comprises a title and at least one topical field. The topical field comprises a field name and field content associated with the topical field of at least one document in the corpus of data. At least one relation instance is automatically created using the computerized device by combining a field identifier for the topical field, a title concept identifier, and a corresponding field content concept identifier. A count for each relation instance is automatically calculated, using the computerized device, based on the number of occurrences of the title concept identifier and the field content concept identifier within a corresponding one of the at least one documents. Terms in the question are analyzed, using the computerized device, to identify at least one question topic term and at least one question content identifier based on previously established question term categories. The question topic term and the question content identifier are automatically compared to at least one of the at least one relation instance, using the computerized device, to identify at least one question-matching relation instance. The computerized device automatically generates at least one answer to the question by identifying the title concept identifier of each question-matching relation instance as a candidate answer to the question. The computerized device generates a score for the candidate answers by adding each count within each relation instance corresponding to each candidate answer.

According to another embodiment herein, a computer-implemented method for automatically generating answers to questions is disclosed. According to the method, a title and at least one topical field for at least a portion of at least one document in a corpus of data comprising natural language documents are identified using a computerized device. A plurality of title-oriented documents is constructed from the corpus of data, by the computerized device. Each title-oriented document comprises a title and at least one topical field. The topical field comprises a field name and field content associated with the topical field of the at least one document in the corpus of data. The computerized device constructs a structured knowledge base from the plurality of title-oriented documents. The structured knowledge base comprises at least one relation instance comprising a field identifier for the topical field, a title concept identifier, and a corresponding field content concept identifier. A question to be answered by a Question Answering (QA) system is provided to a processor of the computerized device. The processor analyzes terms in the question to identify at least one question topic term and at least one question content identifier based on previously established question term categories. The processor creates a collection of candidate answers to the question from the structured knowledge base.

According to another embodiment herein, a computerized device is disclosed. The computerized device includes a processor comprising a question-answer system performing a plurality of question answering processes, a user interface operatively connected to the processor that receives a question into the question-answer system, and a tangible storage device operatively connected to the processor storing a corpus of data comprising a plurality of natural language documents. The processor automatically constructs a plurality of title-oriented documents from the corpus of data. Each title-oriented document comprises a title and at least one topical field. The topical field comprises a field name and field content associated with the topical field of at least one document in the corpus of data. The processor automatically creates at least one relation instance by combining a field identifier for the topical field, a title concept identifier, and a corresponding field content concept identifier. Terms in the question are analyzed to identify at least one question topic term and at least one question content identifier based on previously established question term categories. The processor automatically compares the question topic term and the question content identifier to at least one of the at least one relation instance to identify at least one question-matching relation instance. At least one answer to the question is automatically generated by identifying the title concept identifier of each question-matching relation instance as a candidate answer to the question.

According to another embodiment herein, a computer system for scoring answers to questions in a question-answering system is disclosed. The system comprises a processor comprising an automated question answering (QA) system comprising a tangible storage device operatively connected to the processor storing a corpus of data comprising a plurality of natural language documents and a user interface operatively connected to the processor receiving a question into the automated QA system. The processor automatically constructs a plurality of title-oriented documents from the corpus of data. Each title-oriented document comprises a title and at least one topical field. The topical field comprises a field name and field content associated with the topical field of at least one document in the corpus of data. The processor automatically creates at least one relation instance by combining a field identifier for the topical field, a title concept identifier, and a corresponding field content concept identifier. The processor automatically calculates a count for each relation instance based on a number of occurrences of the title concept identifier and the field content concept identifier within a corresponding one of the at least one documents. The processor automatically analyzes terms in the question to identify at least one question topic term and at least one question content identifier based on previously established question term categories. The processor automatically compares the at least one question topic term and the at least one question content identifier to at least one of the at least one relation instance to identify at least one question-matching relation instance. The processor automatically generates at least one answer to the question by identifying the title concept identifier of each question-matching relation instance as a candidate answer to the question. The processor generates a score for the candidate answers by adding each count within each relation instance corresponding to each candidate answer.

According to another embodiment, a question answering (QA) system is disclosed. The QA system comprises a processor comprising a named-entity extractor, a first interface connected to the processor, a second interface connected to the processor, and a non-transitory storage device operatively connected to the processor storing a corpus of data comprising a plurality of natural language documents. The first interface receives a question to be answered by the QA system. The processor automatically identifies a title and at least one topical field of at least a portion of at least one document in the corpus of data, using the named-entity extractor. The processor automatically constructs a plurality of title-oriented documents from the corpus of data. Each title-oriented document comprises the title and at least one of the at least one topical field. The topical field comprises a field name and field content associated with the topical field of at least a portion of at least one document in the corpus of data. The processor automatically constructs a structured knowledge base from the plurality of title-oriented documents. The structured knowledge base comprises at least one relation instance comprising a field identifier for the topical field, a title concept identifier, and a corresponding field content concept identifier. The processor automatically analyzes terms in the question to identify at least one question topic term and at least one question content identifier based on previously established question term categories. The processor automatically creates a collection of candidate answers to the question. The collection of candidate answers is created from the structured knowledge base. The processor automatically outputs each candidate answer through the second interface.

According to another embodiment herein, a computer program product for answering questions in a question-answering system is disclosed. The computer program product comprises a computer readable storage medium having program code embodied therewith. The program code is readable and executable by a computer to perform a method. According to the method, a title and at least one topical field are automatically identified for at least a portion of at least one document in a grouping of documents. A field name and field content associated with the topical field are automatically identified. At least one title-oriented document is automatically created by combining the title, the field name, and the field content associated with the at least one topical field of the at least a portion of the at least one document. For each title-oriented document, at least one term in the title that corresponds to previously established categories is automatically identified to produce at least one title concept identifier. For each title-oriented document, the topical field is automatically synthesized to produce at least one field concept identifier and at least one field content concept identifier from the previously established categories. At least one relation instance is automatically created by combining, for each field concept identifier, a corresponding title concept identifier of the at least one title concept identifier and a corresponding field content concept identifier of the at least one field content concept identifier. A count for each relation instance is automatically calculated based on a number of occurrences of the title concept identifier and the field content concept identifier within a corresponding one of the at least one documents in the grouping of documents. The count is automatically included with each relation instance. A question is received. The terms in the question are automatically analyzed to identify at least one question topic term and at least one question content identifier based on previously established question term categories. The question topic term and the question content identifier are automatically compared to at least one of the at least one relation instance to identify at least one question-matching relation instance. The title concept identifier of each question-matching relation instance is automatically identified as a candidate answer to the question. Each count within each relation instance corresponding to each candidate answer is automatically added to provide an answer score for each candidate answer. Each candidate answer and a corresponding answer score are automatically output.

BRIEF DESCRIPTION OF THE DRAWINGS

The embodiments herein will be better understood from the following detailed description with reference to the drawings, which are not necessarily drawn to scale and in which:

FIG. 1 is a block diagram illustrating various aspects of embodiments herein;

FIG. 2 is a block diagram illustrating various aspects of embodiments herein;

FIG. 3 is a block diagram illustrating various aspects of embodiments herein;

FIG. 4 is a block diagram illustrating various aspects of embodiments herein;

FIG. 5 is a block diagram illustrating various aspects of embodiments herein;

FIG. 6 is a flow diagram illustrating embodiments herein;

FIG. 7 is a schematic diagram of a hardware system according to embodiments herein;

FIG. 8 is a schematic diagram of a deployment system according to embodiments herein;

FIG. 9 is a schematic diagram of an integration system according to embodiments herein;

FIG. 10 is a schematic diagram of an on demand system according to embodiments herein;

FIG. 11 is a schematic diagram of a virtual private network system according to embodiments herein; and

FIG. 12 is a schematic diagram of a virtual private network system according to embodiments herein.

DETAILED DESCRIPTION

It will be readily understood that the embodiments of the present disclosure, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations in addition to the embodiments described herein. Thus, the following detailed description of the embodiments, as represented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments. The following description is intended only by way of example, and simply illustrates certain embodiments of the invention, as claimed herein.

Referring to FIG. 1, according to one embodiment, a novel method of automatically constructing structured knowledge bases that can be used to answer questions precisely is disclosed. First, title-oriented documents (TODs) are created from a natural language corpus 111. A structured knowledge base is constructed from the TODs 121. To do this, relations are constructed between concepts in the titles of the documents and concepts in the bodies of the documents from the natural language corpus. As used herein, ‘concepts’ are formed by synthesizing the information in the title and/or text body to a representative simpler word or phrase, which is used as a concept identifier. To answer questions 131, concepts that occur in the relations constructed between the concepts in the titles of the documents and the concepts in the bodies of the documents are matched with concepts that also occur in the question. For scoring the answers to the questions 141, a count is made of the number of times the answer occurs in a concept that occurs in a constructed relation and that also has a concept that occurs in the question.

FIG. 2 shows an example for constructing a TOD. For each document in a corpus of documents about a subject matter of interest, the document 212 is converted into one or more title-oriented documents (TODs) 222. A TOD 222 includes a title and one or more topical fields associated with the title. Each topical field has a field name and field content, which includes at least a portion of the text derived from the document.

An example TOD may include:

TITLE: Rheumatoid arthritis

FIELD NAME: SYMPTOMS

FIELD CONTENT: Symptoms include inflammation and increased stiffness.

FIELD NAME: TREATMENT

FIELD CONTENT: Treatments include analgesics and NSAIDS.

The TOD title is created from the titles or headings 232 within the corpus documents. TOD fields include field names 242 and associated field content 252, and are created from headings and text in the corpus documents using a named-entity extractor. A named-entity extractor locates and classifies elements in the text into predefined categories such as the names of persons, locations, names of diseases, syndromes, conditions, diagnoses, symptoms, signs, findings, laboratory test results, quantities, organizations, expressions of times, monetary values, percentages, etc. (The preceding list is intended to include non-limiting examples of predefined categories.) While creating the TOD, variant terms used in the corpus document headings (such as “Signs and Symptoms”, “Symptoms and Signs”, and “Symptoms”) should be mapped to TOD field names according to their appropriate synonyms (such as “SYMPTOMS”, as shown at 242).
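As an illustration only, the mapping of heading variants to a single TOD field name can be sketched as follows. The sketch assumes a hand-written synonym table in place of the named-entity extractor described above; the names `FIELD_NAME_SYNONYMS`, `TitleOrientedDocument`, `normalize_field_name`, and `build_tod` are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass, field

# Hypothetical synonym table: heading variant (lowercased) -> TOD field name.
# A real embodiment would rely on a named-entity extractor instead.
FIELD_NAME_SYNONYMS = {
    "signs and symptoms": "SYMPTOMS",
    "symptoms and signs": "SYMPTOMS",
    "symptoms": "SYMPTOMS",
    "treatment": "TREATMENT",
    "treatments": "TREATMENT",
}


@dataclass
class TitleOrientedDocument:
    title: str
    fields: dict = field(default_factory=dict)  # field name -> field content


def normalize_field_name(heading):
    """Map a corpus heading to its TOD field name, or None if unknown."""
    return FIELD_NAME_SYNONYMS.get(heading.strip().lower())


def build_tod(title, sections):
    """sections: iterable of (heading, text) pairs from one corpus document."""
    tod = TitleOrientedDocument(title=title)
    for heading, text in sections:
        name = normalize_field_name(heading)
        if name is None:
            continue  # this sketch skips headings with no known field name
        # Sections that map to the same field name are concatenated.
        tod.fields[name] = (tod.fields.get(name, "") + " " + text).strip()
    return tod


tod = build_tod(
    "Rheumatoid arthritis",
    [
        ("Signs and Symptoms", "Symptoms include inflammation and increased stiffness."),
        ("Treatment", "Treatments include analgesics and NSAIDS."),
    ],
)
```

Here `tod.fields` holds the two fields of the example TOD above, keyed by "SYMPTOMS" and "TREATMENT".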

In FIG. 3, a structured knowledge base 323 is constructed from a plurality of TODs 222, 333. The structured knowledge base 323 comprises a set of relation instances 343 derived from the TODs 222, 333. A relation instance 343 may take the form (field concept identifier, unique concept identifier, unique concept identifier). Each relation instance 343 is created by combining a field concept identifier for the topical field, typically the field name, a title concept identifier, and a field content concept identifier corresponding to the title concept. The structured knowledge base 323 may also include a map from each relation instance 343 to a count 353, which is described in more detail below.

The structured knowledge base 323 may be constructed as follows:

For each TOD 222, 333: (1) perform named entity recognition on the title 232 to obtain one or more unique concept identifiers for the title, sometimes referred to herein as a title unique concept identifier (TUCI) 363; (2) for each field in the TOD 222, 333, perform named entity recognition on the field content 252 to obtain one or more unique concept identifiers for the field content, sometimes referred to herein as a field unique concept identifier (FUCI) 373. For each title unique concept identifier (TUCI) 363 and for each field unique concept identifier (FUCI) 373, increment the count 353 of the structured knowledge base relation instance (FIELD NAME,TUCI,FUCI) 343.
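The construction loop above can be sketched as follows. This is a minimal sketch under stated assumptions: named entity recognition is replaced by a hypothetical phrase lookup (`CONCEPTS`, `recognize`), and the concept identifiers such as `C_INFLAMMATION` are made up for the example.

```python
from collections import Counter
from itertools import product

# Hypothetical stand-in for a named entity recognizer: phrase -> concept identifier.
CONCEPTS = {
    "rheumatoid arthritis": "C_RHEUMATOID_ARTHRITIS",
    "inflammation": "C_INFLAMMATION",
    "increased stiffness": "C_STIFFNESS",
}


def recognize(text):
    """Return one concept identifier per occurrence of a known phrase in the text."""
    lowered = text.lower()
    found = []
    for phrase, uci in CONCEPTS.items():
        found.extend([uci] * lowered.count(phrase))
    return found


def build_knowledge_base(tods):
    """tods: iterable of (title, {field_name: field_content}) pairs."""
    counts = Counter()
    for title, fields in tods:
        tucis = recognize(title)  # step (1): concepts in the title
        for field_name, content in fields.items():
            fucis = recognize(content)  # step (2): concepts in the field content
            # Increment the count of (FIELD NAME, TUCI, FUCI) for each pairing.
            for tuci, fuci in product(tucis, fucis):
                counts[(field_name, tuci, fuci)] += 1
    return counts


kb = build_knowledge_base([
    (
        "Rheumatoid arthritis",
        {"SYMPTOMS": "Symptoms include inflammation and increased stiffness. "
                     "Inflammation may persist."},
    ),
])
```

A `Counter` keyed by the relation instance tuple plays the role of the map from relation instance 343 to count 353.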

In some embodiments, the count can be determined based on confidence in the unique concept identifier. A “confidence” value is a real number (usually between 0.0 and 1.0) that is produced by the named entity recognizer. A confidence value of 1.0 indicates that the named entity recognizer is 100% confident that the text is referring to the unique concept identifier. A confidence value of 0.0 indicates that the named entity recognizer is 0% confident that the text is referring to the unique concept identifier. For example, when constructing the structured knowledge base, instead of incrementing the count by 1 for each occurrence of a term in the field content, the count can be incremented by [the confidence of the TUCI] multiplied by [the confidence of the FUCI]. Therefore, the “count” in the structured knowledge base may no longer be an integer count, but rather a real number. In this case, it may be referred to as a “weighted count.”
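A weighted count can be accumulated in the same way, as in the hedged sketch below. It assumes the recognizer returns (concept identifier, confidence) pairs; the function name `add_weighted_counts` and the identifiers and confidence values are hypothetical.

```python
from collections import defaultdict


def add_weighted_counts(kb, field_name, title_mentions, field_mentions):
    """Add confidence-weighted increments for one field of one TOD.

    Each mention is a (concept identifier, confidence) pair, assumed to come
    from a named entity recognizer.
    """
    for tuci, tuci_conf in title_mentions:
        for fuci, fuci_conf in field_mentions:
            # Increment by [confidence of TUCI] * [confidence of FUCI] instead of 1.
            kb[(field_name, tuci, fuci)] += tuci_conf * fuci_conf
    return kb


kb = defaultdict(float)
add_weighted_counts(
    kb,
    "SYMPTOMS",
    title_mentions=[("C_RA", 0.9)],
    field_mentions=[("C_INFLAMMATION", 1.0), ("C_INFLAMMATION", 0.5)],
)
```

With these made-up values the weighted count for (SYMPTOMS, C_RA, C_INFLAMMATION) is 0.9 * 1.0 + 0.9 * 0.5, a real number rather than an integer.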

Referring to FIG. 4, the structured knowledge base 323 may be used to generate answers to questions. The terms in the question are analyzed to identify at least one question content identifier and at least one question topic term. Named entity recognition is performed on the question 414 to obtain one or more unique concept identifiers for the question 414, sometimes referred to herein as a question unique concept identifier (QUCI). In the example shown in FIG. 4, one question unique concept identifier may be related to “increased stiffness” 424 and another question unique concept identifier may be related to “inflammation” 434. Named entity recognition performed on the question 414 also obtains a question topic term, which is similar to a question unique concept identifier. In the example shown in FIG. 4, the question topic term may be related to “disease” 444.

According to embodiments herein, the question topic term 444 is compared with the structured knowledge base relation instance (FIELD NAME,TUCI,FUCI) 343 in order to search in the appropriate field. When a question unique concept identifier (QUCI) 424, 434 in the question 414 correlates with a field unique concept identifier (FUCI) 373 in the relation instance, the title unique concept identifier (TUCI) 363 may be a potential answer 454.

In some embodiments, the title unique concept identifier of each question-matching relation instance may be identified as a candidate answer to the question. However, according to embodiments herein, it is possible to apply the relation instances “in reverse”. That is, it is also possible to identify the field unique concept identifier of a question-matching relation instance as a candidate answer to the question, where the title unique concept identifier of the relation instance appears in the question.

For each (FIELD NAME,UCI1,UCI2):

(1) If the first unique concept identifier (UCI1) is a question unique concept identifier (QUCI), generate the second unique concept identifier (UCI2) as a potential answer.

(2) If the second unique concept identifier (UCI2) is a question unique concept identifier (QUCI), generate the first unique concept identifier (UCI1) as a potential answer.
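The two rules above can be sketched as a single pass over the relation instances. The function name `generate_candidates` and the concept identifiers are hypothetical, chosen only for illustration.

```python
def generate_candidates(relation_instances, question_ucis):
    """relation_instances: iterable of (field_name, uci1, uci2) tuples."""
    question_ucis = set(question_ucis)
    candidates = set()
    for _field_name, uci1, uci2 in relation_instances:
        if uci1 in question_ucis:
            candidates.add(uci2)  # rule (1): UCI1 is in the question, UCI2 is an answer
        if uci2 in question_ucis:
            candidates.add(uci1)  # rule (2): UCI2 is in the question, UCI1 is an answer
    return candidates


relations = [
    ("SYMPTOMS", "C_RA", "C_INFLAMMATION"),
    ("SYMPTOMS", "C_RA", "C_STIFFNESS"),
    ("TREATMENT", "C_RA", "C_NSAIDS"),
]

# Question mentions symptoms: the title concept is the potential answer.
forward = generate_candidates(relations, {"C_INFLAMMATION", "C_STIFFNESS"})
# Question mentions the disease: relations are applied "in reverse".
reverse = generate_candidates(relations, {"C_RA"})
```

In this sketch `forward` contains only the title concept, while `reverse` contains the three field content concepts.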

Answers can also be made more precise by filtering answers.

(1) If it is known what type of answer is desired, filter candidate answers by question type, and only generate answers that are of that type. For example, if it is known that the question is asking for a disease, only generate answers that are diseases.

(2) If it is known what type of question is being asked, filter candidate answers by relation type, and only use relation instances whose field name corresponds to that type of question. For example, if it is known that the question is asking for a disease given its symptoms, only use relation instances whose FIELD NAME is “SYMPTOMS”.
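Both filters can be sketched as simple predicates, shown below. The sketch assumes a hypothetical table `concept_types` that records a semantic type for each concept identifier; the disclosure does not specify how answer types are stored.

```python
def filter_relations(relation_instances, field_name):
    """Filter (2): keep relation instances whose field name fits the question type."""
    return [r for r in relation_instances if r[0] == field_name]


def filter_answers(candidates, concept_types, answer_type):
    """Filter (1): keep candidates whose type matches the desired answer type."""
    return {c for c in candidates if concept_types.get(c) == answer_type}


relations = [
    ("SYMPTOMS", "C_RA", "C_INFLAMMATION"),
    ("TREATMENT", "C_RA", "C_NSAIDS"),
]
# Hypothetical type table: concept identifier -> semantic type.
concept_types = {"C_RA": "disease", "C_NSAIDS": "drug", "C_INFLAMMATION": "symptom"}

symptom_relations = filter_relations(relations, "SYMPTOMS")
diseases = filter_answers({"C_RA", "C_NSAIDS"}, concept_types, "disease")
```

For a question asking for a disease given its symptoms, both filters would be applied: the first restricts the relation instances to the SYMPTOMS field, the second keeps only candidates typed as diseases.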

FIG. 5 shows an example for how to score potential answers 454 to questions 414. Answers to questions may come from sources other than the structured knowledge base. The structured knowledge base 323 can be used to score answers that come from the structured knowledge base 323, as well as answers that come from other sources.

As described above, perform named-entity recognition on the question 414 to obtain one or more unique concept identifiers 424, 434 for the question. Given a unique concept identifier for an answer to the question, sometimes referred to herein as an answer unique concept identifier (AUCI) 515, score that answer by computing the sum of the counts 353 associated with relation instances 343 of the form (FIELD NAME,AUCI,QUCI) or (FIELD NAME,QUCI,AUCI), where QUCI is one of the unique concept identifiers for the question 434. In the example shown in FIG. 5, the QUCI for “inflammation” 434 matches the AUCI in the first line of the structured knowledge base 323 with a count 353 of twenty-three (23). The QUCI for “increased stiffness” 424 matches the AUCI in the second line of the structured knowledge base 323 with a count 353 of twelve (12). Return the sum of counts as the score of the answer. In this example, the score 525 for the answer 454 is the sum of 23+12; therefore, the score 525 is thirty-five (35).
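The count-based score can be sketched as follows, reusing the counts 23 and 12 from the FIG. 5 example. The function name `score_answer`, the concept identifiers, and the third relation instance with its count of 7 are hypothetical additions for illustration.

```python
def score_answer(kb_counts, auci, question_ucis):
    """Sum the counts of relation instances that pair the answer with a question concept."""
    total = 0
    for (_field_name, uci1, uci2), count in kb_counts.items():
        forward = uci1 == auci and uci2 in question_ucis   # (FIELD NAME, AUCI, QUCI)
        reverse = uci2 == auci and uci1 in question_ucis   # (FIELD NAME, QUCI, AUCI)
        if forward or reverse:
            total += count
    return total


kb_counts = {
    ("SYMPTOMS", "C_RA", "C_INFLAMMATION"): 23,
    ("SYMPTOMS", "C_RA", "C_STIFFNESS"): 12,
    ("TREATMENT", "C_RA", "C_NSAIDS"): 7,  # hypothetical; not matched by the question
}

score = score_answer(kb_counts, "C_RA", {"C_INFLAMMATION", "C_STIFFNESS"})  # 23 + 12 = 35
```

Because the scorer only needs an AUCI and the question concepts, it can also score answers that were generated by sources other than the structured knowledge base.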

Alternatively, compute the sum of the pointwise mutual information between the AUCI and QUCI, where QUCI is one of the unique concept identifiers for the question. In this case, return this sum of the pointwise mutual information as the score of the answer. Pointwise mutual information is a measure of association between the unique concept identifiers. That is, pointwise mutual information between unique concept identifiers UCI1 and UCI2 is defined as the logarithm base 2 of the probability that UCI1 and UCI2 co-occur divided by the product of the probability of UCI1 and the probability of UCI2.

In embodiments where the count is a weighted count, the score is computed in a similar fashion. That is, the weighted counts are treated exactly the same way as un-weighted counts in the pointwise mutual information calculation.
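A sketch of the pointwise mutual information calculation follows. The disclosure does not state how the probabilities are estimated; this sketch assumes relative frequencies taken from the relation instance counts, pooled across field names. The function name `make_pmi` is hypothetical.

```python
import math
from collections import Counter


def make_pmi(kb_counts):
    """Return a function pmi(uci1, uci2) estimated from relation instance counts.

    Assumption: probabilities are relative frequencies over all counts,
    pooled across field names.
    """
    total = sum(kb_counts.values())
    pair, first, second = Counter(), Counter(), Counter()
    for (_field_name, uci1, uci2), count in kb_counts.items():
        pair[(uci1, uci2)] += count
        first[uci1] += count
        second[uci2] += count

    def pmi(uci1, uci2):
        p_joint = pair[(uci1, uci2)] / total
        if p_joint == 0:
            return float("-inf")  # the two concepts never co-occur
        p1 = first[uci1] / total
        p2 = second[uci2] / total
        return math.log2(p_joint / (p1 * p2))

    return pmi


pmi = make_pmi({
    ("SYMPTOMS", "C_A", "C_X"): 1,
    ("SYMPTOMS", "C_B", "C_Y"): 1,
})
value = pmi("C_A", "C_X")  # log2(0.5 / (0.5 * 0.5))
```

The same function accepts weighted counts, since it only sums and divides the stored values.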

Furthermore, a confidence value can be used in calculating an answer score. For example, when calculating the score from counts or from pointwise mutual information, instead of computing the plain sum of the counts (or weighted counts, or pointwise mutual information values computed from counts or from weighted counts), compute the sum of those values, each of which is multiplied by the confidence associated with the matching unique concept identifier for the question, as produced by the named entity recognizer when run on the question text. A formula for using a confidence value to calculate an answer score may include:

[FinalScore of AUCI] = [Confidence of QUCI1] * [WeightedCount of (FIELD NAME,QUCI1,AUCI)] + [Confidence of QUCI2] * [WeightedCount of (FIELD NAME,QUCI2,AUCI)] + . . .

where QUCI1 and QUCI2 appear in the question, according to the named entity recognizer, and the “Confidence of QUCI1” is the named entity recognizer's confidence that QUCI1 is actually being referred to in the question.
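The formula can be sketched as follows. The sketch matches relation instances in the order written in the formula, (FIELD NAME,QUCI,AUCI); the function name `final_score`, the identifiers, and the numeric values are hypothetical.

```python
def final_score(weighted_counts, auci, question_mentions):
    """Confidence-weighted answer score.

    question_mentions: iterable of (QUCI, confidence) pairs, assumed to come
    from the named entity recognizer run on the question text.
    """
    total = 0.0
    for quci, confidence in question_mentions:
        for (_field_name, uci1, uci2), weighted_count in weighted_counts.items():
            if (uci1, uci2) == (quci, auci):  # form (FIELD NAME, QUCI, AUCI)
                total += confidence * weighted_count
    return total


weighted_counts = {
    ("SYMPTOMS", "C_INFLAMMATION", "C_RA"): 20.0,
    ("SYMPTOMS", "C_STIFFNESS", "C_RA"): 10.0,
}
score = final_score(
    weighted_counts,
    "C_RA",
    [("C_INFLAMMATION", 0.5), ("C_STIFFNESS", 0.25)],
)  # 0.5 * 20.0 + 0.25 * 10.0
```

Relation instances stored in the other order, (FIELD NAME,AUCI,QUCI), could be handled by adding the symmetric comparison.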

Scores can also be made more precise by filtering the relation instances used to compute the sum as described above if it is known what type of answer is desired or what type of question is being asked.

According to embodiments herein, the system uses natural language generation to convert unstructured data into useable knowledge/model. From this knowledge, the system generates and scores answers to questions. The safer planet framework can be used to create TODs and the structured knowledge base. According to embodiments herein, the method converts documents into a structured knowledge base. The method comprises constructing relation instances between (A) named entities recognized in the title within documents and (B) named entities recognized in the text of those documents. The relation name is constructed based on what section of the document the particular named entity that has been recognized in the text (B) appears in. The number of occurrences of that named entity (B) is counted for use in scoring the answers to questions. The count is also used to compute pointwise mutual information to score answers to questions.

FIG. 6 is a flow diagram illustrating the processing flow of an exemplary method of answering questions and scoring answers using structured knowledge mined from a corpus according to embodiments herein. In item 600, a title and at least one topical field are identified for at least a portion of a document in a grouping of documents. At 607, a field name and field content associated with the topical field are identified. At least one title-oriented document is created, at 614, by combining the title, the field name, and the field content associated with the at least one topical field of the at least a portion of the document. At 621, for each title-oriented document, at least one term in the title that corresponds to previously established categories is identified to produce at least one title concept identifier. At 628, the topical field is synthesized to produce at least one field concept identifier and at least one field content concept identifier from the previously established categories. At least one relation instance is created, at 635, by combining, for each field concept identifier, a corresponding title concept identifier of the at least one title concept identifier and a corresponding field content concept identifier of the at least one field content concept identifier. At 642, a count for each relation instance is calculated based on the number of occurrences of the title concept identifier and the field content concept identifier within a corresponding one of the documents in the grouping of documents. The count is included with each relation instance, at 649. At 656, a question is received. The terms in the question are analyzed, at 663, to identify at least one question topic term. In some embodiments, one or more terms in the question may optionally be matched to previously established categories to produce a question concept identifier, at 670. At 677, the question topic term and the question content identifier are matched to at least one of the at least one relation instance to identify at least one question-matching relation instance. At 684, the title concept identifier of each question-matching relation instance is identified as a candidate answer to the question. Each count within each relation instance corresponding to each candidate answer is added, at 691, to provide an answer score for each candidate answer. At 698, each candidate answer and each corresponding answer score are output.
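The matching and scoring portion of the flow above (matching the question to relation instances, collecting candidate answers, and adding counts) might look like the following sketch. The tuple layout, the sample data, and the function name `score_answers` are hypothetical, and matching the question topic term against the field concept identifier by simple equality is an assumption; the disclosure does not fix a particular matching rule.

```python
from collections import defaultdict

# Hypothetical relation instances:
# (title concept id, field concept id, field content concept id, count).
RELATION_INSTANCES = [
    ("Aspirin", "treats", "headache", 2),
    ("Aspirin", "treats", "fever", 1),
    ("Ibuprofen", "treats", "headache", 1),
    ("Ibuprofen", "treats", "inflammation", 1),
]


def score_answers(instances, topic_term, content_ids):
    """Return (candidate answer, score) pairs, highest score first.

    An instance matches when its field concept equals the question topic term
    and its field content concept is among the question content identifiers.
    The score of a candidate is the sum of the counts of its matching instances.
    """
    scores = defaultdict(int)
    for title_id, field_id, content_id, count in instances:
        if field_id == topic_term and content_id in content_ids:
            scores[title_id] += count
    # Sort by descending score, then by name for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Example question: "What treats headache and fever?"
answers = score_answers(RELATION_INSTANCES, "treats", {"headache", "fever"})
```

Here `answers` lists each title concept identifier that appears in a question-matching relation instance together with its summed count.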

Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to various embodiments. It will be understood that each block of the flowchart illustrations and/or two-dimensional block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. The computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

According to a further embodiment herein, an article of manufacture is provided that includes a tangible computer readable medium having computer readable instructions embodied therein for performing the steps of the computer implemented methods, including but not limited to the method illustrated in FIG. 6. Any combination of one or more computer readable non-transitory medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The non-transitory computer storage medium stores instructions, and a processor executes the instructions to perform the methods described herein. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Any of these devices may have computer readable instructions for carrying out the steps of the methods described above with reference to FIG. 6.

The computer program instructions may be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

Furthermore, the computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

In the case of implementing the embodiments herein by software and/or firmware, a program constituting the software may be installed into a computer with dedicated hardware, from a storage medium or a network, and the computer is capable of performing various functions when various programs are installed therein.

A representative hardware environment for practicing the embodiments herein is depicted in FIG. 7. This schematic drawing illustrates a hardware configuration of an information handling/computer system in accordance with the embodiments herein. The system comprises at least one processor or central processing unit (CPU) 710. The CPUs 710 are interconnected via system bus 712 to various devices such as a Random Access Memory (RAM) 714, Read-Only Memory (ROM) 716, and an Input/Output (I/O) adapter 718. The I/O adapter 718 can connect to peripheral devices, such as disk units 711 and tape drives 713, or other program storage devices that are readable by the system. The system can read the inventive instructions on the program storage devices and follow these instructions to execute the methodology of the embodiments herein.

In FIG. 7, CPUs 710 perform various processing based on a program stored in a Read Only Memory (ROM) 716 or a program loaded from a peripheral device, such as disk units 711 and tape drives 713, to a Random Access Memory (RAM) 714. The RAM 714 also stores, as necessary, the data required when the CPU 710 performs the various processing or the like. The CPU 710, the ROM 716, and the RAM 714 are connected to one another via a bus 712. An Input/Output adapter 718 is also connected to the bus 712 to provide an Input/Output interface, as necessary. A removable medium, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed on the peripheral device, as necessary, so that a computer program read therefrom may be installed into the RAM 714, as necessary.

The system further includes a user interface adapter 719 that connects a keyboard 715, mouse 717, speaker 724, microphone 722, and/or other user interface devices such as a touch screen device (not shown) to the bus 712 to gather user input. Additionally, a communication adapter 720 including a network interface card such as a LAN card, a modem, or the like connects the bus 712 to a data processing network 725. The communication adapter 720 performs communication processing via a network such as the Internet. A display adapter 721 connects the bus 712 to a display device 723, which may be embodied as an output device such as a monitor (such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), or the like), printer, or transmitter, for example.

In the case where the above-described series of processing is implemented with software, the program that constitutes the software may be installed from a network such as the Internet or a storage medium such as the removable medium.

Those skilled in the art would appreciate that the storage medium is not limited to the peripheral device having the program stored therein as illustrated in FIG. 7, which is distributed separately from the device for providing the program to the user. Examples of a removable medium include a magnetic disk (including a floppy disk), an optical disk (including a Compact Disk-Read Only Memory (CD-ROM) and a Digital Versatile Disk (DVD)), a magneto-optical disk (including a Mini-Disk (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 716, a hard disk contained in the storage section 711, or the like, which has the program stored therein and is distributed to the user together with the device that contains them.

As will be appreciated by one skilled in the art, aspects of the systems and methods herein may be embodied as a system, method, or computer program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable non-transitory medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The non-transitory computer storage medium stores instructions, and a processor executes the instructions to perform the methods described herein. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an erasable programmable Read-Only Memory (EPROM or Flash memory), an optical fiber, a magnetic storage device, a portable compact disc Read-Only Memory (CD-ROM), an optical storage device, a “plug-and-play” memory device, like a USB flash drive, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments herein. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Deployment types include loading directly in the client, server, and proxy computers via loading a storage medium such as a CD, DVD, etc. The process software may also be automatically or semi-automatically deployed into a computer system by sending the process software to a central server or a group of central servers. The process software is then downloaded into the client computers that will execute the process software. Alternatively, the process software is sent directly to the client system via e-mail. The process software is then either detached to a directory or loaded into a directory by a button on the e-mail that executes a program that detaches the process software into a directory. Another alternative is to send the process software directly to a directory on the client computer hard drive. When there are proxy servers, the process will select the proxy server code, determine on which computers to place the proxy servers' code, transmit the proxy server code, and then install the proxy server code on the proxy computer. The process software will be transmitted to the proxy server, and then stored on the proxy server.

While it is understood that the process software may be deployed by manually loading directly in the client, server, and proxy computers via loading a storage medium such as a CD, DVD, etc., the process software may also be automatically or semi-automatically deployed into a computer system by sending the process software to a central server or a group of central servers. The process software is then downloaded into the client computers that will execute the process software. Alternatively, the process software is sent directly to the client system via e-mail. The process software is then either detached to a directory or loaded into a directory by a button on the e-mail that executes a program that detaches the process software into a directory. Another alternative is to send the process software directly to a directory on the client computer hard drive. When there are proxy servers, the process will select the proxy server code, determine on which computers to place the proxy servers' code, transmit the proxy server code, and then install the proxy server code on the proxy computer. The process software will be transmitted to the proxy server, and then stored on the proxy server.

In FIG. 8, step 800 begins the deployment of the process software. The first thing is to determine if there are any programs that will reside on a server or servers when the process software is executed 801. If this is the case, then the servers that will contain the executables are identified 909. The process software for the server or servers is transferred directly to the servers' storage via FTP or some other protocol, or by copying through the use of a shared file system 910. The process software is then installed on the servers 911.

Next, a determination is made on whether the process software is to be deployed by having users access the process software on a server or servers 802. If the users are to access the process software on servers, then the server addresses that will store the process software are identified 803.

A determination is made if a proxy server is to be built 900 to store the process software. A proxy server is a server that sits between a client application, such as a Web browser, and a real server. It intercepts all requests to the real server to see if it can fulfill the requests itself. If not, it forwards the request to the real server. The two primary benefits of a proxy server are to improve performance and to filter requests. If a proxy server is required, then the proxy server is installed 901. The process software is either sent to the servers via a protocol such as FTP or it is copied directly from the source files to the server files via file sharing 902. Another embodiment would be to send a transaction to the servers that contain the process software and have the server process the transaction, then receive and copy the process software to the server's file system. Once the process software is stored at the servers, the users, via their client computers, then access the process software on the servers and copy it to their client computers' file systems 903. Another embodiment is to have the servers automatically copy the process software to each client and then run the installation program for the process software at each client computer. The users execute the program that installs the process software on their client computer 912, and then exit the process 808.
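The proxy behavior described above (intercept a request, fulfill it locally if possible, otherwise forward it, and optionally filter it) can be illustrated with a small sketch. The class name `CachingProxy`, the in-memory cache, the `blocked` set, and the use of a plain callable in place of the real server are all assumptions made for illustration.

```python
class CachingProxy:
    """Sits between a client and a real server (here, any callable)."""

    def __init__(self, origin, blocked=()):
        self.origin = origin          # stand-in for the real server
        self.blocked = set(blocked)   # requests the proxy filters out
        self.cache = {}               # responses the proxy can serve itself
        self.forwarded = 0            # number of requests sent to the real server

    def handle(self, request):
        if request in self.blocked:
            return None               # filtered: never reaches the real server
        if request in self.cache:
            return self.cache[request]  # fulfilled by the proxy itself
        self.forwarded += 1
        response = self.origin(request)  # forwarded to the real server
        self.cache[request] = response
        return response


def real_server(request):
    # Hypothetical origin that just echoes the requested path.
    return "content for " + request


proxy = CachingProxy(real_server, blocked={"/admin"})
first = proxy.handle("/software")
second = proxy.handle("/software")  # served from the cache, not forwarded
```

The two benefits named in the text map onto the two early returns: the cache hit improves performance, and the blocked check filters requests.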

In step 804, a determination is made whether the process software is to be deployed by sending the process software to users via e-mail. The set of users where the process software will be deployed is identified together with the addresses of the user client computers 805. The process software is sent via e-mail 904 to each of the users' client computers. The users receive the e-mail 905 and then detach the process software from the e-mail to a directory on their client computers 906. The users execute the program that installs the process software on their client computer 912, and then exit the process 808.

Lastly, a determination is made on whether the process software will be sent directly to user directories on their client computers 806. If so, the user directories are identified 807. The process software is transferred directly to the users' client computer directory 907. This can be done in several ways such as but not limited to sharing of the file system directories and then copying from the sender's file system to the recipient users' file system or alternatively using a transfer protocol such as File Transfer Protocol (FTP). The users access the directories on their client file systems in preparation for installing the process software 908. The users execute the program that installs the process software on their client computer 912, and then exit the process 808.
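The deployment decisions of FIG. 8 described above can be summarized as a small planning function. This is only a sketch: the `DeploymentOptions` fields, the step descriptions, and the reading that the three client delivery paths are tried in order and each ends with the users running the installer are assumptions drawn from the prose, not a definitive rendering of the figure.

```python
from dataclasses import dataclass


@dataclass
class DeploymentOptions:
    runs_on_servers: bool = False       # decision 801
    users_access_servers: bool = False  # decision 802
    needs_proxy: bool = False           # decision 900
    send_by_email: bool = False         # decision 804
    send_to_directories: bool = False   # decision 806


def plan_deployment(opts):
    """Return the ordered deployment actions implied by the options."""
    steps = []
    if opts.runs_on_servers:
        steps += ["identify servers", "transfer to servers", "install on servers"]
    if opts.users_access_servers:
        steps.append("identify server addresses")
        if opts.needs_proxy:
            steps.append("install proxy server")
        steps += ["send software to servers", "users copy software to clients"]
    elif opts.send_by_email:
        steps += ["identify users and client addresses", "send e-mail",
                  "users detach software"]
    elif opts.send_to_directories:
        steps += ["identify user directories", "transfer to user directories",
                  "users access directories"]
    else:
        return steps  # no client delivery path was selected
    steps.append("users run installer")  # step 912, shared by all client paths
    return steps


email_plan = plan_deployment(DeploymentOptions(send_by_email=True))
```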

The process software is integrated into a client, server, and network environment by providing for the process software to coexist with applications, operating systems, and network operating systems software, and then installing the process software on the clients and servers in the environment where the process software will function.

The first step is to identify any software on the clients and servers, including the network operating system, where the process software will be deployed that is required by the process software or that works in conjunction with the process software. This includes the network operating system, which is software that enhances a basic operating system by adding networking features.

Next, the software applications and version numbers will be identified and compared to the list of software applications and version numbers that have been tested to work with the process software. Those software applications that are missing or that do not match the correct version will be upgraded with the correct version numbers. Program instructions that pass parameters from the process software to the software applications will be checked to ensure the parameter lists match the parameter lists required by the process software. Conversely, parameters passed by the software applications to the process software will be checked to ensure the parameters match the parameters required by the process software. The client and server operating systems including the network operating systems will be identified and compared to the list of operating systems, version numbers, and network software that have been tested to work with the process software. Those operating systems, version numbers, and network software that do not match the list of tested operating systems and version numbers will be upgraded on the clients and servers to the required level.

After ensuring that the software, where the process software is to be deployed, is at the correct version level that has been tested to work with the process software, the integration is completed by installing the process software on the clients and servers.

In FIG. 9, step 920 begins the integration of the process software. The first thing is to determine if there are any process software programs that will execute on a server or servers at 921. If this is not the case, then integration proceeds to 927. If this is the case, then the server addresses are identified 922. The servers are checked to see if they contain software that includes the operating system (OS), applications, and network operating systems (NOS), together with their version numbers, that have been tested with the process software 923. The servers are also checked to determine if there is any missing software that is required by the process software 923.

A determination is made if the version numbers match the version numbers of OS, applications, and NOS that have been tested with the process software 924. If all of the versions match and there is no missing required software, the integration continues in 927.

If one or more of the version numbers do not match, then the unmatched versions are updated on the server or servers with the correct versions 925. Additionally, if there is missing required software, then it is updated on the server or servers 925. The server integration is completed by installing the process software 926.

Step 927, which follows either step 921, 924, or 926, determines if there are any programs of the process software that will execute on the clients. If no process software programs execute on the clients, the integration proceeds to 930 and exits. If this is not the case, then the client addresses are identified 928.

The clients are checked to see if they contain software that includes the operating system (OS), applications, and network operating systems (NOS), together with their version numbers, that have been tested with the process software 929. The clients are also checked to determine if there is any missing software that is required by the process software 929.

A determination is made as to whether the version numbers match the version numbers of OS, applications, and NOS that have been tested with the process software 931. If all of the versions match and there is no missing required software, then the integration proceeds to 930 and exits.

If one or more of the version numbers do not match, then the unmatched versions are updated on the clients with the correct versions 932. In addition, if there is missing required software then it is updated on the clients 932. The client integration is completed by installing the process software on the clients 933. The integration proceeds to 930 and exits.
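The version check that FIG. 9 applies to servers and clients alike can be sketched as a comparison between what is installed and the tested list. The function name `plan_upgrades`, the dictionary representation, and the "install" and "update" labels are hypothetical; versions are compared by simple equality, which is an assumption.

```python
def plan_upgrades(installed, tested):
    """Compare installed software with the versions tested with the process software.

    installed: mapping of software name -> installed version (absent if missing).
    tested:    mapping of software name -> version tested with the process software.
    Returns a mapping of software name -> (action, required version) for every
    item that is missing or at an unmatched version.
    """
    actions = {}
    for name, required in tested.items():
        current = installed.get(name)
        if current is None:
            actions[name] = ("install", required)   # missing required software
        elif current != required:
            actions[name] = ("update", required)    # unmatched version
    return actions


# Hypothetical host: OS matches, database is outdated, NOS is missing.
TESTED = {"os": "5.2", "db": "11.0", "nos": "3.1"}
INSTALLED = {"os": "5.2", "db": "10.4"}
upgrades = plan_upgrades(INSTALLED, TESTED)
```

An empty result corresponds to the branch in which all versions match and integration proceeds directly to the next step.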

The process software is shared, simultaneously serving multiple customers in a flexible, automated fashion. It is standardized, requiring little customization, and it is scalable, providing capacity on demand in a pay-as-you-go model.

The process software can be stored on a shared file system accessible from one or more servers. The process software is executed via transactions that contain data and server processing requests that use CPU units on the accessed server. CPU units are units of time such as minutes, seconds, hours on the central processor of the server. Additionally, the accessed server may make requests of other servers that require CPU units. CPU units are an example that represents one measurement of use. Other measurements of use include but are not limited to network bandwidth, memory usage, storage usage, packet transfers, complete transactions etc.

When multiple customers use the same process software application, their transactions are differentiated by the parameters included in the transactions that identify the unique customer and the type of service for that customer. All of the CPU units and other measurements of use that are used for the services for each customer are recorded. When the number of transactions to any one server reaches a number that begins to affect the performance of that server, other servers are accessed to increase the capacity and to share the workload. Likewise, when other measurements of use such as network bandwidth, memory usage, storage usage, etc. approach a capacity so as to affect performance, additional network bandwidth, memory usage, storage etc. are added to share the workload.

The measurements of use used for each service and customer are sent to a collecting server that sums the measurements of use for each customer for each service that was processed anywhere in the network of servers that provide the shared execution of the process software. The summed measurements of use units are periodically multiplied by unit costs and the resulting total process software application service costs are alternatively sent to the customer and/or indicated on a web site accessed by the customer, which then remits payment to the service provider.

In another embodiment, the service provider requests payment directly from a customer account at a banking or financial institution.

In another embodiment, if the service provider is also a customer of the customer that uses the process software application, the payment owed to the service provider is reconciled to the payment owed by the service provider to minimize the transfer of payments.
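The metering described above, in which a collecting server sums the measurements of use per customer and per service and multiplies them by unit costs, might be sketched as follows. The record layout, the measurement names, and the unit costs are invented for the example, and `Decimal` is used only so that the illustrative arithmetic is exact.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical unit costs per measurement of use.
UNIT_COSTS = {
    "cpu_seconds": Decimal("0.02"),
    "network_mb": Decimal("0.001"),
}

# Hypothetical usage records: (customer, service, measurement, amount).
RECORDS = [
    ("acme", "qa", "cpu_seconds", 30),
    ("acme", "qa", "network_mb", 200),
    ("acme", "qa", "cpu_seconds", 20),
    ("globex", "qa", "cpu_seconds", 10),
]


def summarize_charges(records, unit_costs):
    """Sum usage per (customer, service), then price each sum by its unit cost."""
    usage = defaultdict(lambda: defaultdict(int))
    for customer, service, measurement, amount in records:
        usage[(customer, service)][measurement] += amount
    charges = {}
    for key, totals in usage.items():
        charges[key] = sum(
            (unit_costs[name] * quantity for name, quantity in totals.items()),
            Decimal("0"),
        )
    return charges


charges = summarize_charges(RECORDS, UNIT_COSTS)
```

How the resulting charges are delivered (sent to the customer, posted to a web site, or debited from an account) is outside the scope of this sketch.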

In FIG. 10, step 940 begins the On Demand process. A transaction is created that contains the unique customer identification, the requested service type, and any service parameters that further specify the type of service 941. The transaction is then sent to the main server 942. In an On Demand environment, the main server can initially be the only server; then, as capacity is consumed, other servers are added to the On Demand environment.

The server central processing unit (CPU) capacities in the On Demand environment are queried 943. The CPU requirement of the transaction is estimated, then the servers' available CPU capacity in the On Demand environment is compared to the transaction CPU requirement to see if there is sufficient CPU capacity available in any server to process the transaction 944. If there is not sufficient server CPU capacity available, then additional server CPU capacity is allocated to process the transaction 948. If there was already sufficient CPU capacity available, then the transaction is sent to a selected server 945.
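The capacity check above can be illustrated with a short sketch. The function name `select_server`, the first-fit selection rule, and the way additional capacity is modeled (adding a new server sized to the transaction) are assumptions; the text only states that capacity is compared and, if insufficient, additional capacity is allocated.

```python
def select_server(available_cpu, required_cpu):
    """Pick a server with enough available CPU, allocating capacity if none has it.

    available_cpu: mapping of server name -> available CPU units (mutated when
                   new capacity is allocated).
    required_cpu:  estimated CPU requirement of the transaction.
    Returns (server name, whether additional capacity was allocated).
    """
    for server, capacity in available_cpu.items():
        if capacity >= required_cpu:
            return server, False  # sufficient capacity already available
    # No server can take the transaction: allocate additional capacity.
    new_server = f"server-{len(available_cpu) + 1}"
    available_cpu[new_server] = required_cpu
    return new_server, True


# Hypothetical environment that starts with only the main server.
pool = {"main": 4}
small = select_server(pool, 3)
large = select_server(pool, 10)
```

The sketch does not decrement capacity after a transaction is placed; a fuller model would track consumption over time.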

Before executing the transaction, a check is made of the remaining On Demand environment to determine if the environment has sufficient available capacity for processing the transaction. This environment capacity consists of such things as, but not limited to, network bandwidth, processor memory, storage etc. 946. If there is not sufficient available capacity, then capacity will be added to the On Demand environment 947. Next, the required software to process the transaction is accessed and loaded into memory, and then the transaction is executed 949.

The usage measurements are recorded 950. The usage measurements consist of the portions of those functions in the On Demand environment that are used to process the transaction. The usage of such functions as, but not limited to, network bandwidth, processor memory, storage and CPU cycles is what is recorded. The usage measurements are summed, multiplied by unit costs, and then recorded as a charge to the requesting customer 951. If the customer has requested that the On Demand costs be posted to a web site 952, then they are posted 953.

If the customer has requested that the On Demand costs be sent via e-mail to a customer address 954, then they are sent 955. If the customer has requested that the On Demand costs be paid directly from a customer account 956, then payment is received directly from the customer account 957. The last step is to exit the On Demand process 958.

The process software may be deployed, accessed and executed through the use of a virtual private network (VPN), which is any combination of technologies that can be used to secure a connection through an otherwise unsecured or untrusted network. VPNs are used to improve security and to reduce operational costs. The VPN makes use of a public network, usually the Internet, to connect remote sites or users together. Instead of using a dedicated, real-world connection such as a leased line, the VPN uses “virtual” connections routed through the Internet from the company's private network to the remote site or employee.

The process software may be deployed, accessed, and executed through either a remote-access or a site-to-site VPN. When using the remote-access VPNs, the process software is deployed, accessed, and executed via the secure, encrypted connections between a company's private network and remote users through a third-party service provider. The enterprise service provider (ESP) sets up a network access server (NAS) and provides the remote users with desktop client software for their computers. The telecommuters can then dial a toll-free number or attach directly via a cable or DSL modem to reach the NAS and use their VPN client software to access the corporate network and to access, download, and execute the process software.

When using the site-to-site VPN, the process software is deployed, accessed, and executed through the use of dedicated equipment and large-scale encryption, which are used to connect a company's multiple fixed sites over a public network, such as the Internet.

The process software is transported over the VPN via tunneling, which is the process of placing an entire packet within another packet and sending it over a network. The protocol of the outer packet is understood by the network and both points, called tunnel interfaces, where the packet enters and exits the network.

In FIGS. 11 and 12, step 960 begins the Virtual Private Network (VPN)process. A determination is made to see if a VPN for remote access isrequired 961. If it is not required, then proceed to 962. If it isrequired, then determine if the remote access VPN exists 964.

If it does exist, then proceed to 965. Otherwise, identify the third party provider that will provide the secure, encrypted connections between the company's private network and the company's remote users 976. The company's remote users are identified 977. The third party provider then sets up a network access server (NAS) 978 that allows the remote users to dial a toll-free number or attach directly via a cable or DSL modem to access, download, and install the desktop client software for the remote-access VPN 979.

After the remote access VPN has been built, or if it has been previously installed, the remote users can then access the process software by dialing into the NAS or attaching directly via a cable or DSL modem into the NAS 965. This allows entry into the corporate network where the process software is accessed 966. The process software is transported to the remote users' desktop over the network via tunneling. That is, the process software is divided into packets and each packet, including the data and protocol, is placed within another packet 967. When the process software arrives at the remote users' desktop, it is removed from the packets, reconstituted, and executed on the remote users' desktop 968.
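
The tunneling of steps 967 and 968 can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the class names, the protocol labels, the sequence-number field, and the packet size are all assumptions introduced for illustration, and encryption of the tunnel is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    """Inner packet: carries its own protocol label and a slice of the software."""
    protocol: str   # hypothetical inner protocol label
    seq: int        # hypothetical sequence number, used only for reassembly
    data: bytes


@dataclass(frozen=True)
class OuterPacket:
    """Outer packet: its protocol is the one understood by the public network."""
    protocol: str   # hypothetical outer protocol label
    payload: Packet  # the entire inner packet, data and protocol included


def split_into_packets(software: bytes, size: int,
                       protocol: str = "inner-proto") -> List[Packet]:
    """Divide the process software into fixed-size inner packets."""
    return [Packet(protocol, i, software[off:off + size])
            for i, off in enumerate(range(0, len(software), size))]


def encapsulate(packets: List[Packet],
                outer_protocol: str = "outer-proto") -> List[OuterPacket]:
    """Entry tunnel interface: place each inner packet within another packet."""
    return [OuterPacket(outer_protocol, p) for p in packets]


def decapsulate_and_reassemble(outer_packets: List[OuterPacket]) -> bytes:
    """Exit tunnel interface: remove inner packets and reconstitute the software."""
    inner = sorted((o.payload for o in outer_packets), key=lambda p: p.seq)
    return b"".join(p.data for p in inner)


software = b"process software image"
tunneled = encapsulate(split_into_packets(software, size=5))
# Packets may arrive out of order; the sequence numbers restore the original.
restored = decapsulate_and_reassemble(list(reversed(tunneled)))
```

The inner packet is carried unchanged as the payload of the outer packet, so only the two tunnel interfaces need to understand the inner protocol.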

A determination is made to see if a VPN for site-to-site access is required 962. If it is not required, then proceed to exit the process 963. Otherwise, determine if the site-to-site VPN exists 969. If it does exist, then proceed to 972. Otherwise, install the dedicated equipment required to establish a site-to-site VPN 970. Then build the large-scale encryption into the VPN 971.

After the site-to-site VPN has been built, or if it had been previously established, the users access the process software via the VPN 972. The process software is transported to the site users over the network via tunneling 973. That is, the process software is divided into packets and each packet, including the data and protocol, is placed within another packet 974. When the process software arrives at the site users' desktop, it is removed from the packets, reconstituted, and executed on the site users' desktop 975. Proceed to exit the process 963.
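
The decision flow of steps 960 through 979 can be summarized in a short Python sketch that returns the sequence of reference numerals visited for a given set of conditions. The function name and its four boolean parameters are hypothetical. The sketch also assumes that the flow continues to the site-to-site determination 962 after step 968, which the description implies but does not state explicitly.

```python
from typing import List


def vpn_process(remote_required: bool, remote_exists: bool,
                site_required: bool, site_exists: bool) -> List[int]:
    """Return the reference numerals visited, in order, for the VPN process."""
    steps = [960, 961]  # begin; is a remote-access VPN required?

    if remote_required:
        steps.append(964)  # does the remote-access VPN exist?
        if not remote_exists:
            # Identify provider, identify users, set up NAS, install client.
            steps += [976, 977, 978, 979]
        # Dial in, access software, tunnel it, reconstitute and execute it.
        steps += [965, 966, 967, 968]

    steps.append(962)  # is a site-to-site VPN required? (assumed to follow 968)

    if site_required:
        steps.append(969)  # does the site-to-site VPN exist?
        if not site_exists:
            # Install dedicated equipment, build in large-scale encryption.
            steps += [970, 971]
        # Access via VPN, tunnel, encapsulate, reconstitute and execute.
        steps += [972, 973, 974, 975]

    steps.append(963)  # exit
    return steps
```

For example, when neither VPN is required the call `vpn_process(False, False, False, False)` visits only 960, 961, 962, and 963.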

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of this disclosure. As used herein, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The descriptions of the various embodiments herein have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

What is claimed is:
 1. A computerized device comprising: a processor comprising a computerized question-answer system performing a plurality of question answering processes; a user interface operatively connected to said processor, said user interface receiving a question into said question-answer system; and a tangible storage device operatively connected to said processor, said tangible storage device storing a corpus of data comprising a plurality of natural language documents, said processor: automatically constructing a plurality of title-oriented documents from said corpus of data, each said title-oriented document comprising a title and at least one topical field, said at least one topical field comprising a field name and field content associated with said at least one topical field of at least one document in said corpus of data, automatically creating at least one relation instance by combining a field identifier for said at least one topical field, a title concept identifier, and a corresponding field content concept identifier, automatically analyzing terms in said question to identify at least one question topic term and at least one question content identifier based on previously established question term categories, automatically comparing said question topic term and said question content identifier to at least one of said at least one relation instance to identify at least one question-matching relation instance, and automatically generating at least one answer to said question by identifying said title concept identifier of each said question-matching relation instance as a candidate answer to said question.
 2. The computerized device of claim 1, further comprising a named-entity extractor to locate and classify elements in said natural language documents into predefined categories for said title and said field name.
 3. The computerized device of claim 1, said processor analyzing terms in said question further comprising: identifying a type of question; and filtering said candidate answers according to said type of question.
 4. The computerized device of claim 1, said processor analyzing terms in said question further comprising: identifying a type of relation based on said question; and filtering said relation instances according to said type of relation.
 5. The computerized device of claim 4, said processor automatically constructing a plurality of title-oriented documents from a corpus of data further comprising: for each said title-oriented document, automatically identifying at least one term in said title that corresponds to previously established categories, using said computerized question-answer system, to produce at least one title concept identifier; and for each said title-oriented document, automatically synthesizing said topical field, using said computerized question-answer system, to produce at least one field concept identifier and at least one field content concept identifier from said previously established categories.
 6. The computerized device of claim 1, said processor further: automatically calculating a count for each said relation instance, using said computerized device, based on a number of occurrences of said title concept identifier and said field content concept identifier within a corresponding one of said at least one document in said corpus of data.
 7. The computerized device of claim 6, said count for each said relation instance further comprising pointwise mutual information.
 8. The computerized device of claim 6, said processor further adding each said count within each said relation instance corresponding to said candidate answer to provide an answer score for said candidate answer.
 9. A computer system for scoring answers to questions in a question-answering system, comprising: a processor comprising an automated question answering (QA) system comprising: a tangible storage device operatively connected to said processor, said tangible storage device storing a corpus of data comprising a plurality of natural language documents; and a user interface operatively connected to said processor, said user interface receiving a question into said automated QA system, said processor automatically constructing a plurality of title-oriented documents from said corpus of data, each said title-oriented document comprising a title and at least one topical field, said at least one topical field comprising a field name and field content associated with said at least one topical field of at least one document in said corpus of data, said processor automatically creating at least one relation instance by combining a field identifier for said at least one topical field, a title concept identifier, and a corresponding field content concept identifier, said processor automatically calculating a count for each said relation instance based on a number of occurrences of said title concept identifier and said field content concept identifier within a corresponding one of said at least one document, said processor automatically analyzing terms in said question to identify at least one question topic term and at least one question content identifier based on previously established question term categories, said processor automatically comparing said at least one question topic term and said at least one question content identifier to at least one of said at least one relation instance to identify at least one question-matching relation instance, said processor automatically generating at least one answer to said question by identifying said title concept identifier of each said question-matching relation instance as a candidate answer to said question, and said processor generating a score for said candidate answer by adding each said count within each said relation instance corresponding to said candidate answer.
 10. The computer system of claim 9, said count for each said relation instance further comprising pointwise mutual information.
 11. The computer system of claim 9, said processor including said count with each said relation instance.
 12. The computer system of claim 9, further comprising a named-entity extractor to locate and classify elements in said natural language document into predefined categories for said title and said field name, and said processor automatically constructing a plurality of title-oriented documents from said corpus of data further comprising: for each said title-oriented document, automatically identifying at least one term in said title that corresponds to said predefined categories to produce at least one title concept identifier; and for each said title-oriented document, automatically synthesizing said topical field to produce at least one field concept identifier and at least one field content concept identifier from said predefined categories.
 13. The computer system of claim 9, said processor further automatically identifying a title and at least one topical field of at least a portion of a document in said corpus of data; and automatically identifying field content associated with said topical field.
 14. A question answering (QA) system comprising: a processor comprising a named-entity extractor; a first interface operatively connected to said processor; a second interface operatively connected to said processor; and a non-transitory storage device operatively connected to said processor, said non-transitory storage device storing a corpus of data comprising a plurality of natural language documents, said first interface receiving a question to be answered by said QA system, said processor automatically identifying a title and at least one topical field of at least a portion of a document in said corpus of data, using said named-entity extractor, said processor automatically constructing a plurality of title-oriented documents from said corpus of data, each said title-oriented document comprising said title and at least one topical field, said at least one topical field comprising a field name and field content associated with said at least one topical field of at least one document in said corpus of data, said processor automatically constructing a structured knowledge base from said plurality of title-oriented documents, said structured knowledge base comprising at least one relation instance comprising a field identifier for said at least one topical field, a title concept identifier, and a corresponding field content concept identifier, said processor automatically analyzing terms in said question to identify at least one question topic term and at least one question content identifier based on previously established question term categories, said processor automatically creating a collection of candidate answers to said question, said collection of candidate answers being created from said structured knowledge base, and said processor automatically outputting each said candidate answer through said second interface.
 15. The question answering system of claim 14, said processor constructing a plurality of title-oriented documents from said corpus of data further comprising: for each said title-oriented document, automatically identifying at least one term in said title that corresponds to previously established categories, using said QA system, to produce at least one title concept identifier; and for each said title-oriented document, synthesizing said topical field to produce at least one field concept identifier and at least one field content concept identifier from said previously established categories.
 16. The question answering system of claim 14, said processor further calculating a count for each said relation instance based on a number of occurrences of said title concept identifier and said field content concept identifier within a corresponding one of said at least one document in said corpus of data comprising natural language documents.
 17. The question answering system of claim 16, said processor further generating a score for said candidate answers by adding each said count within each said relation instance corresponding to said candidate answers.
 18. The question answering system of claim 14, said processor analyzing terms in said question further comprising: identifying a type of question; and filtering said candidate answers according to said type of question.
 19. The question answering system of claim 14, said processor analyzing terms in said question further comprising: identifying a type of relation based on said question; and filtering said relation instances according to said type of relation.
 20. A computer program product for answering questions in a question-answering system, said computer program product comprising a computer readable storage medium having program code embodied therewith, said program code being readable and executable by a computer to perform a method comprising: automatically identifying a title and at least one topical field of at least a portion of a document in a grouping of documents; automatically identifying a field name and field content associated with said topical field; automatically creating at least one title-oriented document by combining, said title, said field name, and said field content associated with said at least one topical field of said at least a portion of said document; for each said title-oriented document, automatically identifying at least one term in said title that corresponds to previously established categories to produce at least one title concept identifier; for each said title-oriented document, automatically synthesizing said topical field to produce at least one field concept identifier and at least one field content concept identifier from said previously established categories; automatically creating at least one relation instance by combining, for each said field concept identifier, a corresponding title concept identifier of said at least one title concept identifier and a corresponding field content concept identifier of said at least one field content concept identifier; automatically calculating a count for each said relation instance based on a number of occurrences of said title concept identifier and said field content concept identifier within a corresponding one of said at least one document in said grouping of documents; automatically including said count with each said relation instance; receiving a question into said computer; automatically analyzing terms in said question to identify at least one question topic term and at least one question content identifier based on previously established question term categories; automatically comparing said question topic term and said question content identifier to at least one of said at least one relation instance to identify at least one question-matching relation instance; automatically identifying said title concept identifier of each said question-matching relation instance as a candidate answer to said question; automatically adding each said count within each said relation instance corresponding to each said candidate answer to provide an answer score for each said candidate answer; and automatically outputting each said candidate answer and each corresponding said answer score from said computer.
 21. The computer program product according to claim 20, said grouping of documents comprising natural language documents.
 22. The computer program product according to claim 20, said automatically identifying at least one term in said title that corresponds to previously established categories further comprising using a named-entity extractor to locate and classify elements in said at least one term into predefined categories.
 23. The computer program product according to claim 20, said analyzing terms in said question further comprising: identifying a type of question; and filtering said candidate answers according to said type of question.
 24. The computer program product according to claim 20, said analyzing terms in said question further comprising: identifying a type of relation based on said question; and filtering said relation instances according to said type of relation.
 25. The computer program product according to claim 20, said count for each said relation instance further comprising pointwise mutual information.
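
For illustration only, the matching and scoring recited in the claims above (comparing the question topic term and question content identifier to relation instances, then adding the counts of the matching instances for each candidate answer) might be sketched in Python as follows. This is not the claimed implementation. The tuple layout, the function name, the use of exact string equality as the matching criterion, and the sample data are all assumptions, and the pointwise mutual information variant is not shown.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class RelationInstance(NamedTuple):
    field_id: str         # field identifier for the topical field
    title_concept: str    # title concept identifier (the candidate answer)
    content_concept: str  # field content concept identifier
    count: int            # occurrences within the corresponding document


def score_answers(instances: List[RelationInstance],
                  question_topic: str,
                  question_content: str) -> List[Tuple[str, int]]:
    """Sum the counts of question-matching relation instances per candidate."""
    scores = defaultdict(int)
    for inst in instances:
        # Assumed matching rule: exact equality on field and content concept.
        if (inst.field_id == question_topic
                and inst.content_concept == question_content):
            scores[inst.title_concept] += inst.count
    # Highest-scoring candidate answers first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical relation instances, as if mined from title-oriented documents.
instances = [
    RelationInstance("symptom", "influenza", "fever", 3),
    RelationInstance("symptom", "influenza", "fever", 2),
    RelationInstance("symptom", "malaria", "fever", 4),
    RelationInstance("treatment", "influenza", "rest", 7),
]
ranked = score_answers(instances, "symptom", "fever")
```

With this sample data, the two matching instances for "influenza" contribute 3 and 2, so that candidate scores 5 and ranks ahead of "malaria" with 4.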